System accident

A system accident is an "unanticipated interaction of multiple failures" in a complex system. This complexity can be either technological or organizational, and is often both.^[1]

A system accident can be very easy to see in hindsight, but very difficult to see in foresight. Ahead of time, there are simply too many possible action pathways to seriously consider all of them.
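The scale of the foresight problem can be illustrated with a rough count. The sketch below is not from Perrow's work; it assumes a deliberately simplified model in which each component is either working or failed, and the function names are hypothetical. Under that assumption it counts how many distinct combinations of two or more simultaneous failures exist.

```python
from math import comb


def multi_failure_combinations(n_components: int) -> int:
    """Count the ways two or more of n components can fail together.

    Assumes each component is simply 'working' or 'failed' (a simplification).
    """
    # All subsets (2**n) minus the empty set and the n single-failure cases.
    return 2 ** n_components - n_components - 1


def combinations_up_to(n_components: int, max_failures: int) -> int:
    """Count combinations involving between 2 and max_failures failures."""
    return sum(comb(n_components, k) for k in range(2, max_failures + 1))


if __name__ == "__main__":
    for n in (10, 20, 50):
        print(n, multi_failure_combinations(n), combinations_up_to(n, 3))
```

Even when attention is restricted to double and triple failures, the count grows quickly with the number of components, which is one way of reading the claim that not every pathway can be seriously considered ahead of time.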

These accidents often resemble Rube Goldberg devices in the way that small errors of judgment, flaws in technology, and insignificant damages combine to form an emergent disaster. System accidents were described in 1984 by Charles Perrow, who termed them "normal accidents", as having two main characteristics: interactive complexity and tight coupling. James T. Reason extended this approach with human reliability^[2] and the Swiss cheese model, now widely accepted in aviation safety and healthcare.

Once an enterprise passes a certain point in size, with many employees, specialization, backup systems, double-checking, detailed manuals, and formal communication, employees can all too easily resort to protocol, habit, and "being right." Rather like attempting to watch a complicated movie in an unfamiliar language, the narrative thread of what is going on can be lost. Other phenomena, such as groupthink, can also be occurring at the same time, since real-world accidents almost always have multiple causes, and not just the single cause that could have prevented the accident at the very last minute. In particular, it is a mark of a dysfunctional organization to simply blame the last person who touched something.

The processes of formalized organizations are often largely opaque. Perrow calls this "incomprehensibility."

There is an aspect of an animal devouring its own tail, in that more formality and effort to get it just right can actually make the situation worse. The more organizational rigmarole involved in adjusting to changing conditions, the more employees will delay reporting the changing conditions. The more emphasis on formality, the less likely that employees and managers will engage in real communication. New rules can also make matters worse, both by adding another layer of complexity and by telling employees yet again that they are not to think, but are instead simply to follow rules.^[3]

Possible system accidents

Apollo 13, 1970

For more details on this topic, see Apollo 13.

It was found that the accident was not the result of a chance malfunction in a statistical sense, but rather resulted from an unusual combination of mistakes, coupled with a somewhat deficient and unforgiving design.

c. In addition, it is probable that the tank contained a loosely fitting fill tube assembly. This assembly was probably displaced during subsequent handling, which included an incident at the prime contractor's plant in which the tank was jarred.

f. The special detanking procedures at KSC subjected the tank to an extended period of heater operation and pressure cycling. These procedures had not been used before, and the tank had not been qualified by test for the conditions experienced. However, the procedures did not violate the specifications which governed the operation of the heaters at KSC.

h. A number of factors contributed to the presence of inadequate thermostatic switches in the heater assembly. The original 1962 specifications from NR to Beech Aircraft Corporation for the tank and heater assembly specified the use of 28 V dc power, which is used in the spacecraft. In 1965, NR issued a revised specification which stated that the heaters should use a 65 V dc power supply for tank pressurization; this was the power supply used at KSC to reduce pressurization time. Beech ordered switches for the Block II tanks but did not change the switch specifications to be compatible with 65 V dc.

l. As shown by subsequent tests, failure of the thermostatic switches probably permitted the temperature of the heater tube assembly to reach about 1000° F in spots during the continuous 8-hour period of heater operation. Such heating has been shown by tests to severely damage the Teflon insulation on the fan motor wires in the vicinity of the heater assembly. From that time on, including pad occupancy, the oxygen tank no. 2 was in a hazardous condition when filled with oxygen and electrically powered.

m. It was not until nearly 56 hours into the mission, however, that the fan motor wiring, possibly moved by the fan stirring, short circuited and ignited its insulation by means of an electric arc. The resulting combustion in the oxygen tank probably overheated and failed the wiring conduit where it enters the tank, and possibly a portion of the tank itself.^[4]

Three Mile Island, 1979

For more details on this topic, see Three Mile Island accident.

The 1979 Three Mile Island accident, a nuclear accident that resulted from an unanticipated interaction of multiple failures in a complex system, inspired Perrow's book Normal Accidents. TMI was an example of a normal accident because it was "unexpected, incomprehensible, uncontrollable and unavoidable".^[5]

Perrow concluded that the failure at Three Mile Island was a consequence of the system's immense complexity. Such modern high-risk systems, he realized, were prone to failures however well they were managed. It was inevitable that they would eventually suffer what he termed a 'normal accident'. Therefore, he suggested, we might do better to contemplate a radical redesign, or if that was not possible, to abandon such technology entirely.^[6]

When systems exhibit both "high complexity" and "tight coupling", as at Three Mile Island, the risk of failure becomes high. Worse still, according to Perrow, "the addition of more safety devices -- the stock response to a previous failure -- might further reduce the safety margins if it adds complexity".^[6]

ValuJet 592, Everglades, 1996

For more details on this topic, see ValuJet Flight 592.

Step 2. The unmarked cardboard boxes, stored for weeks on a parts rack, were taken over to SabreTech's shipping and receiving department and left on the floor in an area assigned to ValuJet property.
Step 3. Continental Airlines, a potential SabreTech customer, was planning an inspection of the facility, so a SabreTech shipping clerk was instructed to clean up the work place. He decided to send the oxygen generators to ValuJet's headquarters in Atlanta and labelled the boxes "aircraft parts". He had shipped ValuJet material to Atlanta before without formal approval. Furthermore, he misunderstood the green tags to indicate "unserviceable" or "out of service" and jumped to the conclusion that the generators were empty.

Step 4. The shipping clerk made up a load for the forward cargo hold of the five boxes plus two large main tires and a smaller nose tire. He instructed a co-worker to prepare a shipping ticket stating "oxygen canisters - empty". The co-worker wrote, "Oxy Canisters" followed by "Empty" in quotation marks. The tires were also listed.
Step 5. A day or two later the boxes were delivered to the ValuJet ramp agent for acceptance on Flight 592. The shipping ticket listing tires and oxygen canisters should have caught his attention but didn't. The canisters were then loaded against federal regulations, as ValuJet was not registered to transport hazardous materials. It is possible that, in the ramp agent's mind, the possibility of SabreTech workers sending him hazardous cargo was inconceivable.^[7]

References

^ Perrow, Charles (1984). Normal Accidents: Living with High-Risk Technologies, With a New Afterword and a Postscript on the Y2K Problem. Princeton, New Jersey: Princeton University Press, 1999. ISBN 0691004129. First published by Basic Books, 1984.
^ Reason, James (1990-10-26). Human Error. Cambridge University Press. ISBN 0521314194.
^ Langewiesche, William (March 1998). The Lessons of Valujet 592, The Atlantic. See especially the last three paragraphs of his article: “ . . . can load the structure with redundancies, but on the receiving end there comes a point—in the privacy of a hangar or a cockpit—beyond which people rebel. . . ”. And in the next paragraph, “ . . . an entire pretend reality that includes unworkable chains of command, unlearnable training programs, unreadable manuals, and the fiction of regulations, checks, and controls. Such pretend realities extend even into the most self-consciously progressive large organizations, with their attempts to formalize informality, to deregulate the workplace, to share profits and responsibilities, to respect the integrity and initiative of the individual. The systems work in principle, and usually in practice as well, but the two may have little to do with each other. Paperwork floats free of the ground and obscures the murky workplaces where, in the confusion of real life, system accidents are born. . . ”
^ REPORT OF APOLLO 13 REVIEW BOARD ("Cortright Report"), Chair Edgar M. Cortright, CHAPTER 5, FINDINGS, DETERMINATIONS, AND RECOMMENDATIONS, see pages 5-1 through 5-3. See also Apollo 13 Review Board which has the table of contents for the entire report.
^ Perrow, C. (1982), ‘The President’s Commission and the Normal Accident’, in Sills, D., Wolf, C. and Shelanski, V. (Eds), Accident at Three Mile Island: The Human Dimensions, Westview, Boulder, pp. 173–184.
^ Pidgeon, Nick (22 September 2011). "In retrospect: Normal accidents". Nature. Vol. 477.
^ Stimpson, Brian (October 1998). "Operating Highly Complex and Hazardous Technological Systems Without Mistakes: The Wrong Lessons from ValuJet 592" (reprint). Manitoba Professional Engineer. Archived from the original on 2007-09-27. http://web.archive.org/web/20070927004115/http://www.cns-snc.ca/branches/manitoba/valujet.html. Retrieved 2008-03-06.